A készülő Akadémiai nagyszótár számítógépes vonatkozásai = Computational aspects of the forthcoming Academic Dictionary of Hungarian
The project for the Academic Dictionary of Hungarian is presented from a computational point of view. The major steps are the following: collection of the 25-million-running-word Historical Corpus of Hungarian, lemmatization, disambiguation, a user-friendly retrieval interface (www.nvtud.hu/hhcl), a frequency database of the entries, and on-line compilation of the dictionary entries with the XML module of the Corel Office 2000 WordPerfect 9 program. The TEI-based DTD of the dictionary is also presented.
Az Európai Médiafigyelő (EMM) magyar változata = The Hungarian version of the Europe Media Monitor (EMM)
The Europe Media Monitor (http://emm.newsbrief.eu), developed by the European Joint Research Centre, automatically gathers news from several thousand news portals worldwide and sorts it into various categories, 24 hours a day, updating every 10 minutes, using a language technology toolkit. Under a cooperation agreement, the Language Technology Research Group of the MTA Nyelvtudományi Intézet (Research Institute for Linguistics of the Hungarian Academy of Sciences) made it possible for the service to operate in Hungarian. The primary task was recognising Hungarian proper names within the EMM system and handling their suffixed variants. News of international significance can be accessed in every language version that has been processed.
Media monitoring and information extraction for the highly inflected agglutinative language Hungarian
The Europe Media Monitor (EMM) is a fully automatic system that analyses written online news by gathering articles in over 70 languages and by applying text analysis software for 21 languages at present, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort of adding Hungarian text mining tools to EMM for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge of dealing with the Hungarian language is its high degree of inflection and agglutination. We present several experiments in which we apply linguistically lightweight methods to deal with inflection, and we propose a method to overcome the challenges. We also present detailed frequency lists of Hungarian person and location name suffixes, as found in real-life news texts. This empirical data can be used to draw further conclusions and to improve existing Named Entity Recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages such as those of the Slavic language family. The media monitoring and analysis system EMM is freely accessible online via the web pag
Akadémiai nagyszótár = The Historical Dictionary of Hungarian
Over the past four years, the first two volumes of the Dictionary of the Hungarian Language have been published (1119 and 1550 pages, respectively), in line with the objectives outlined in the grant proposal; cf. www.nytud.hu/publ/nszt. The first volume contains the full bibliography of the Dictionary's sources, as well as the other auxiliary materials, including a detailed key to the codes of the dictionary of inflections compiled by László Elekfi. Volume II contains all entries between A and Azsúroz. In addition to the two published volumes, at least first drafts of the entries beginning with the letters B and C have been written in manuscript, and their editing and correction are under way. Katalin Éva Varga, whose employment was financed by the present grant, took part in writing dictionary entries as well as in the philological checking and correction of the items in the list of sources.